Detecting parametric objects in large scenes by Monte Carlo sampling
Point processes constitute a natural extension of Markov Random Fields (MRFs), designed to handle parametric objects. They have proven efficient and competitive for tackling object extraction problems in vision. Simulating these stochastic models is, however, a difficult task. The performance of existing samplers is limited in terms of computation time and convergence stability, especially on large scenes. We propose a new sampling procedure based on a Monte Carlo formalism. Our algorithm exploits the Markovian property of point processes to perform the sampling in parallel. This procedure is embedded in a data-driven mechanism so that points are distributed in the scene according to spatial information extracted from the input data. The performance of the sampler is analyzed through a set of experiments on various object detection problems in large scenes, including comparisons to existing algorithms. The sampler is also tested as an optimization algorithm for MRF-based labeling problems.
Towards the parallelization of Reversible Jump Markov Chain Monte Carlo algorithms for vision problems
Point processes have demonstrated efficiency and competitiveness when addressing object recognition problems in vision. However, simulating these mathematical models is a difficult task, especially on large scenes. Existing samplers suffer from mediocre performance in terms of computation time and stability. We propose a new sampling procedure based on a Monte Carlo formalism. Our algorithm exploits the Markovian properties of point processes to perform the sampling in parallel. This procedure is embedded in a data-driven mechanism such that points are non-uniformly distributed in the scene. The performance of the sampler is analyzed through a set of experiments on various object recognition problems from large scenes, and through comparisons to existing algorithms.
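Both abstracts describe the same core idea: because the point process is Markovian with a bounded interaction radius, the scene can be cut into grid cells and non-adjacent cells sampled simultaneously. Below is a minimal sketch of that checkerboard scheme with data-driven birth proposals; the energy, the names, and the parameters are illustrative assumptions, not the papers' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_energy(p, points, density, radius, beta):
    """Energy contribution of point p: data attraction plus pairwise repulsion."""
    data = -density[int(p[0]), int(p[1])]
    repel = sum(1.0 for q in points
                if q != p and np.hypot(p[0] - q[0], p[1] - q[1]) < radius)
    return data + beta * repel

def checkerboard_sweep(points, density, cell=16, radius=8, beta=1.0, temp=1.0):
    """One sweep of birth/death moves over a 2x2 coloring of grid cells.

    Cells of the same color are at least one cell apart, farther than the
    interaction radius, so their moves touch disjoint neighborhoods and could
    be executed in parallel; this sketch runs them sequentially for clarity.
    """
    h, w = density.shape
    for color in ((0, 0), (0, 1), (1, 0), (1, 1)):
        for cy in range(color[0], h // cell, 2):
            for cx in range(color[1], w // cell, 2):
                y0, x0 = cy * cell, cx * cell
                inside = [p for p in points
                          if y0 <= p[0] < y0 + cell and x0 <= p[1] < x0 + cell]
                if inside and rng.random() < 0.5:
                    # Death move: removing p subtracts its local energy.
                    p = inside[rng.integers(len(inside))]
                    dE = -local_energy(p, points, density, radius, beta)
                    if dE <= 0 or rng.random() < np.exp(-dE / temp):
                        points.remove(p)
                else:
                    # Birth move, data-driven: more likely kept where density is high.
                    p = (rng.uniform(y0, y0 + cell), rng.uniform(x0, x0 + cell))
                    dE = local_energy(p, points, density, radius, beta)
                    if dE <= 0 or rng.random() < np.exp(-dE / temp):
                        points.append(p)

# Points accumulate where the data-driven density map is high.
density = np.zeros((128, 128))
density[32:64, 32:64] = 2.0
pts = []
for _ in range(100):
    checkerboard_sweep(pts, density)
```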
TILDE: A Temporally Invariant Learned DEtector
We introduce a learning-based approach to detect repeatable keypoints under drastic imaging changes of weather and lighting conditions, to which state-of-the-art keypoint detectors are surprisingly sensitive. We first identify good keypoint candidates in multiple training images taken from the same viewpoint. We then train a regressor to predict a score map whose maxima are those points, so that they can be found by simple non-maximum suppression. As there are no standard datasets to test the influence of these kinds of changes, we created our own, which we will make publicly available. We show that our method significantly outperforms state-of-the-art methods in such challenging conditions, while still achieving state-of-the-art performance on the untrained standard Oxford dataset.
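At test time, the pipeline above reduces to thresholding the regressor's score map and keeping its local maxima. A minimal sketch of that non-maximum-suppression step, assuming the score map is already predicted (function name and parameters are illustrative):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def keypoints_from_score_map(score, window=15, threshold=0.5):
    """Return (row, col) keypoints as local maxima of a predicted score map.

    A pixel is kept if it is the maximum of its (window x window)
    neighborhood and exceeds the threshold -- plain non-maximum suppression.
    """
    local_max = score == maximum_filter(score, size=window)
    ys, xs = np.nonzero(local_max & (score > threshold))
    order = np.argsort(score[ys, xs])[::-1]      # strongest responses first
    return list(zip(ys[order], xs[order]))

# Usage on a toy score map:
score = np.random.default_rng(0).random((100, 100))
kps = keypoints_from_score_map(score, window=9, threshold=0.9)
```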
Generating compact meshes under planar constraints: an automatic approach for modeling buildings from aerial LiDAR
We present an automatic approach for modeling buildings from aerial LiDAR data. The method produces accurate, watertight and compact meshes under planar constraints especially designed for urban scenes. The LiDAR point cloud is classified through a non-convex energy minimization problem in order to separate the points labeled as buildings. Roof structures are then extracted from this point subset and used to control the meshing procedure. Experiments highlight the potential of our method in terms of minimal rendering, accuracy and compactness.
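As a rough illustration of labeling a point cloud by energy minimization, the sketch below runs iterated conditional modes over a k-nearest-neighbor graph with a Potts smoothness term. This simple local minimizer stands in for the paper's non-convex formulation; the unary costs, names, and parameters are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def icm_labeling(points, unary, k=10, smooth=0.5, n_iters=10):
    """Label points by iterated conditional modes on a k-NN graph.

    points: (N, 3) coordinates; unary: (N, L) per-point label costs
    (e.g. derived from height and planarity features). A Potts term with
    weight `smooth` penalizes label disagreement between neighbors.
    """
    _, nbrs = cKDTree(points).query(points, k=k + 1)
    nbrs = nbrs[:, 1:]                           # drop each point itself
    labels = np.argmin(unary, axis=1)            # start from the data term alone
    for _ in range(n_iters):
        for i in range(len(points)):
            cost = unary[i].copy()
            for l in range(unary.shape[1]):
                cost[l] += smooth * np.count_nonzero(labels[nbrs[i]] != l)
            labels[i] = np.argmin(cost)
    return labels

# Toy usage: label 0 = ground, label 1 = building, hypothetical height-only costs.
pts3d = np.random.default_rng(0).random((500, 3))
unary = np.stack([pts3d[:, 2], 1.0 - pts3d[:, 2]], axis=1)
labels = icm_labeling(pts3d, unary)
```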
LOD Generation for Urban Scenes
We introduce a novel approach that reconstructs 3D urban scenes in the form of levels of detail (LODs). Starting from raw data sets such as surface meshes generated by multi-view stereo systems, our algorithm proceeds in three main steps: classification, abstraction and reconstruction. From geometric attributes and a set of semantic rules combined with a Markov random field, we classify the scene into four meaningful classes. The abstraction step detects and regularizes planar structures on buildings, fits icons on trees, roofs and facades, and performs filtering and simplification for LOD generation. The abstracted data are then provided as input to the reconstruction step, which generates watertight buildings through a min-cut formulation on a set of 3D arrangements. Our experiments on complex buildings and large-scale urban scenes show that our approach generates meaningful LODs while being robust and scalable. By combining semantic segmentation and abstraction, it also outperforms general mesh approximation approaches at preserving urban structures.
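One piece of the abstraction step that is easy to make concrete is the regularization of planar structures. The sketch below snaps nearly parallel plane normals to a shared orientation, a common form of such regularization; the paper's actual rules go further (orthogonality, coplanarity, facade verticality), and all names here are illustrative.

```python
import numpy as np

def regularize_parallel(normals, angle_tol_deg=10.0):
    """Snap nearly parallel plane normals to a shared orientation.

    Greedy clustering: each normal joins the first cluster whose mean
    direction is within the angular tolerance (up to sign), then every
    plane is reassigned its cluster's mean normal.
    """
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cos_tol = np.cos(np.radians(angle_tol_deg))
    cluster_of, means = [], []
    for v in n:
        for c, m in enumerate(means):
            if abs(v @ m) > cos_tol:             # nearly parallel, either sign
                sign = 1.0 if v @ m > 0 else -1.0
                m = m + sign * v                 # update running mean direction
                means[c] = m / np.linalg.norm(m)
                cluster_of.append(c)
                break
        else:
            cluster_of.append(len(means))
            means.append(v.copy())
    return np.array([means[c] for c in cluster_of])

# Usage: three noisy roof planes collapse to two regularized orientations.
N = np.array([[0, 0, 1.0], [0.02, 0, 1.0], [1.0, 0, 0.05]])
print(regularize_parallel(N))
```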
Hyper-Skin: A Hyperspectral Dataset for Reconstructing Facial Skin-Spectra from RGB Images
We introduce Hyper-Skin, a hyperspectral dataset covering a wide range of wavelengths from the visible (VIS) spectrum (400 nm - 700 nm) to the near-infrared (NIR) spectrum (700 nm - 1000 nm), uniquely designed to facilitate research on facial skin-spectra reconstruction. By reconstructing skin spectra from RGB images, our dataset enables the study of hyperspectral skin analysis, such as melanin and hemoglobin concentrations, directly on consumer devices. Overcoming limitations of existing datasets, Hyper-Skin consists of diverse facial skin data collected with a pushbroom hyperspectral camera. With 330 hyperspectral cubes from 51 subjects, the dataset covers the facial skin from different angles and facial poses. Each hyperspectral cube has dimensions of 1024 × 1024 × 448, resulting in millions of spectral vectors per image. The dataset, carefully curated in adherence to ethical guidelines, includes paired hyperspectral images and synthetic RGB images generated using real camera responses. We demonstrate the efficacy of our dataset by showcasing skin spectra reconstruction using state-of-the-art models on 31 bands of hyperspectral data resampled in the VIS and NIR spectrum. The Hyper-Skin dataset will be a valuable resource to the NeurIPS community, encouraging the development of novel algorithms for skin spectral reconstruction while fostering interdisciplinary collaboration in hyperspectral skin analysis related to cosmetology and skin well-being. Instructions to request the data and the related benchmarking code are publicly available at https://github.com/hyperspectral-skin/Hyper-Skin-2023.
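The paired synthetic RGB images mentioned above can in principle be rendered by integrating each pixel's spectrum against camera response curves. A minimal sketch of that mechanics, with random data standing in for the dataset's real cubes and real responses:

```python
import numpy as np

def hyperspectral_to_rgb(cube, response, wavelengths):
    """Render a synthetic RGB image from a hyperspectral cube.

    cube:        (H, W, B) radiance/reflectance over B wavelength bands
    response:    (B, 3) camera spectral sensitivities for R, G, B
    wavelengths: (B,) band centers in nm, used as integration weights

    Each channel is the spectrum integrated against the corresponding
    camera response curve (a Riemann sum over the bands).
    """
    dl = np.gradient(wavelengths)                # band widths in nm
    rgb = np.einsum('hwb,bc,b->hwc', cube, response, dl)
    return rgb / rgb.max()                       # normalize for display

# Toy usage: 448 bands between 400 and 1000 nm, random response curves.
wl = np.linspace(400, 1000, 448)
cube = np.random.default_rng(0).random((64, 64, 448)).astype(np.float32)
resp = np.random.default_rng(1).random((448, 3))
rgb = hyperspectral_to_rgb(cube, resp, wl)
```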
Learning to Assign Orientations to Feature Points
We show how to train a Convolutional Neural Network to assign a canonical orientation to feature points, given an image patch centered on the feature point. Our method improves feature point matching over the state of the art and can be used in conjunction with any existing rotation-sensitive descriptor. To avoid the tedious and almost impossible task of finding a target orientation to learn, we propose to use Siamese networks, which implicitly find the optimal orientations during training. We also propose a new type of activation function for neural networks that generalizes the popular ReLU, maxout, and PReLU activation functions, and performs better for our task. We validate the effectiveness of our method extensively on four existing datasets, including two non-planar datasets, as well as our own dataset. We show that we outperform the state of the art without the need to retrain for each dataset.
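The activation is described here only as a generalization of ReLU, maxout, and PReLU; one family with that property is a signed sum of maxes over affine pre-activations, sketched below. This is an assumption about the parameterization, not necessarily the paper's exact definition.

```python
import numpy as np

def sum_of_maxes_activation(z, S, J):
    """Signed sum-of-maxes activation over affine pre-activations.

    z: (..., S * J) pre-activations per output unit; the output is
    sum_s (+/-1) * max_j z[..., s, j] with alternating signs.
    With S = 1 this reduces to maxout, which itself contains ReLU and
    PReLU as special cases (assumed parameterization, see lead-in).
    """
    z = z.reshape(z.shape[:-1] + (S, J))
    signs = (-1.0) ** np.arange(S)               # +1, -1, +1, ...
    return np.einsum('...s,s->...', z.max(axis=-1), signs)

# Example: 4 units, each with S=2 groups of J=3 linear responses.
z = np.random.default_rng(0).standard_normal((4, 6))
y = sum_of_maxes_activation(z, S=2, J=3)         # shape (4,)
```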
[DEMO] Tracking Texture-less, Shiny Objects with Descriptor Fields
This demo showcases the method we published at CVPR this year for tracking specular and poorly textured objects, and lets visitors experiment with it and with their own patterns. Our approach requires only a standard monocular camera (no depth sensor needed) and can be easily integrated into existing systems to improve their robustness and accuracy. Code is publicly available.
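The published method replaces raw intensities with "descriptor fields": each image derivative is split into its positive and negative parts and blurred, and the tracker then runs standard template alignment on these channels. A minimal sketch of the field computation, with illustrative parameters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def descriptor_fields(image, sigma=3.0):
    """Smoothed positive/negative gradient channels of a grayscale image.

    Splitting each first-order derivative into positive and negative parts
    before blurring prevents opposite-sign edges from cancelling each other,
    which is what makes the representation usable on specular, texture-less
    surfaces. Returns an (H, W, 4) array used in place of raw intensities.
    """
    channels = []
    for axis in (0, 1):                          # vertical and horizontal derivatives
        g = sobel(image.astype(np.float64), axis=axis)
        channels.append(gaussian_filter(np.maximum(g, 0.0), sigma))
        channels.append(gaussian_filter(np.maximum(-g, 0.0), sigma))
    return np.stack(channels, axis=-1)
```

The tracker then minimizes the sum of squared differences between the template's fields and the warped input fields, just as classical Lucas-Kanade alignment does on intensities.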
A Novel Representation of Parts for Accurate 3D Object Detection and Tracking in Monocular Images
We present a method that estimates, in real time and under challenging conditions, the 3D pose of a known object. Our method relies only on grayscale images, since depth cameras fail on metallic objects; it can handle poorly textured objects and cluttered, changing environments; and the pose it predicts degrades gracefully in the presence of large occlusions. As a result, in contrast with the state of the art, our method is suitable for practical Augmented Reality applications even in industrial environments. To be robust to occlusions, we first learn to detect some parts of the target object. Our key idea is then to predict the 3D pose of each part in the form of the 2D projections of a few control points. The advantages of this representation are threefold: we can predict the 3D pose of the object even when only one part is visible; when several parts are visible, we can easily combine them to compute a better pose of the object; and the 3D pose we obtain is usually very accurate, even when only a few parts are visible.
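Combining visible parts is straightforward once each part predicts the 2D projections of its control points: the 2D-3D correspondences can be pooled into a single PnP problem. A minimal sketch using OpenCV; the function and variable names are illustrative, not the paper's API.

```python
import numpy as np
import cv2

def pose_from_parts(parts_2d, parts_3d, K):
    """Fuse per-part control-point predictions into one object pose.

    parts_2d: list of (n_i, 2) predicted 2D projections, one per visible part
    parts_3d: list of (n_i, 3) matching control points in the object frame
    K:        (3, 3) camera intrinsics

    Pooling all 2D-3D correspondences into a single PnP solve yields a pose
    from a single visible part (given >= 4 control points) and a more
    accurate one when several parts contribute.
    """
    pts_2d = np.concatenate(parts_2d).astype(np.float64)
    pts_3d = np.concatenate(parts_3d).astype(np.float64)
    ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, None)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)                   # rotation vector -> matrix
    return R, tvec                               # object-to-camera transform
```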